Build a monetization model for a game app.
Many games make money from advertising, and all of them face the same dilemma:
An analyst helps the business choose the best time to start showing ads. Knowing the cost of promoting the game, the analyst can calculate its payback under different scenarios.
For now, the game's creators plan to show ads on the building selection screen. Help them avoid running at a loss.
Formulate and test a statistical hypothesis about the data provided:
import math

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from scipy import stats as st

# Default figure size for seaborn/matplotlib plots
sns.set(rc={'figure.figsize': (10, 8)})
ad_costs = pd.read_csv('/datasets/ad_costs.csv', sep=',')
user_source = pd.read_csv('/datasets/user_source.csv', sep=',')
game_actions = pd.read_csv('/datasets/game_actions.csv', sep=',')
ad_costs.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 28 entries, 0 to 27
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   source  28 non-null     object
 1   day     28 non-null     object
 2   cost    28 non-null     float64
dtypes: float64(1), object(2)
memory usage: 800.0+ bytes
user_source.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13576 entries, 0 to 13575
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   user_id  13576 non-null  object
 1   source   13576 non-null  object
dtypes: object(2)
memory usage: 212.2+ KB
game_actions.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 135640 entries, 0 to 135639
Data columns (total 5 columns):
 #   Column          Non-Null Count   Dtype
---  ------          --------------   -----
 0   event_datetime  135640 non-null  object
 1   event           135640 non-null  object
 2   building_type   127957 non-null  object
 3   user_id         135640 non-null  object
 4   project_type    1866 non-null    object
dtypes: object(5)
memory usage: 5.2+ MB
ad_costs.duplicated().sum()
0
game_actions.duplicated().sum()
1
game_actions = game_actions.drop_duplicates()
user_source.duplicated().sum()
0
ad_costs['day'] = pd.to_datetime(ad_costs['day'])
game_actions.head()
| | event_datetime | event | building_type | user_id | project_type |
|---|---|---|---|---|---|
| 0 | 2020-05-04 00:00:01 | building | assembly_shop | 55e92310-cb8e-4754-b622-597e124b03de | NaN |
| 1 | 2020-05-04 00:00:03 | building | assembly_shop | c07b1c10-f477-44dc-81dc-ec82254b1347 | NaN |
| 2 | 2020-05-04 00:00:16 | building | assembly_shop | 6edd42cc-e753-4ff6-a947-2107cd560710 | NaN |
| 3 | 2020-05-04 00:00:16 | building | assembly_shop | 92c69003-d60a-444a-827f-8cc51bf6bf4c | NaN |
| 4 | 2020-05-04 00:00:35 | building | assembly_shop | cdc6bb92-0ccb-4490-9866-ef142f09139d | NaN |
game_actions['project_type'].unique()
array([nan, 'satellite_orbital_assembly'], dtype=object)
game_actions['building_type'].unique()
array(['assembly_shop', 'spaceport', nan, 'research_center'], dtype=object)
Presumably, building_type is empty for events in which no building was constructed, and project_type is empty for events in which the orbital assembly was not completed.
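This assumption can be checked before filling the gaps by cross-tabulating the event type against missingness. A minimal sketch on a small hypothetical frame with the same columns as game_actions; on the real data the same two crosstab calls would go right before fillna.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame mirroring the game_actions columns
toy = pd.DataFrame({
    'event': ['building', 'building', 'project', 'finished_stage_1', 'building'],
    'building_type': ['assembly_shop', 'spaceport', np.nan, np.nan, 'research_center'],
    'project_type': [np.nan, np.nan, 'satellite_orbital_assembly', np.nan, np.nan],
})

# Rows: event type; columns: False / True for "value is missing"
missing_building = pd.crosstab(toy['event'], toy['building_type'].isna())
missing_project = pd.crosstab(toy['event'], toy['project_type'].isna())
print(missing_building)
print(missing_project)
```

If the assumption holds, the True column of the first table is zero for 'building' events, and the True column of the second table is zero for 'project' events.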
game_actions['project_type'] = game_actions['project_type'].fillna('unknown')
game_actions['building_type'] = game_actions['building_type'].fillna('unknown')
game_actions.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 135639 entries, 0 to 135639
Data columns (total 5 columns):
 #   Column          Non-Null Count   Dtype
---  ------          --------------   -----
 0   event_datetime  135639 non-null  object
 1   event           135639 non-null  object
 2   building_type   135639 non-null  object
 3   user_id         135639 non-null  object
 4   project_type    135639 non-null  object
dtypes: object(5)
memory usage: 6.2+ MB
game_actions['time'] = pd.to_datetime(game_actions['event_datetime'])
game_actions.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 135639 entries, 0 to 135639
Data columns (total 6 columns):
 #   Column          Non-Null Count   Dtype
---  ------          --------------   -----
 0   event_datetime  135639 non-null  object
 1   event           135639 non-null  object
 2   building_type   135639 non-null  object
 3   user_id         135639 non-null  object
 4   project_type    135639 non-null  object
 5   time            135639 non-null  datetime64[ns]
dtypes: datetime64[ns](1), object(5)
memory usage: 7.2+ MB
game_actions.sample(5)
| | event_datetime | event | building_type | user_id | project_type | time |
|---|---|---|---|---|---|---|
| 86633 | 2020-05-12 00:07:21 | building | spaceport | 49b88fb6-6920-45f9-8972-d86b6c96f291 | unknown | 2020-05-12 00:07:21 |
| 122169 | 2020-05-17 11:57:48 | building | research_center | 30b6f4eb-9d97-4e88-b2b2-3a2404107e76 | unknown | 2020-05-17 11:57:48 |
| 86890 | 2020-05-12 00:55:30 | building | spaceport | 1338b633-8c0f-4ac3-a66c-7170ce7e3d9c | unknown | 2020-05-12 00:55:30 |
| 50885 | 2020-05-09 01:32:36 | building | spaceport | d07d9d48-e5b4-4e72-996a-26d3fd2c1766 | unknown | 2020-05-09 01:32:36 |
| 24015 | 2020-05-06 16:36:22 | building | assembly_shop | 4306ddc6-daa2-4805-8ab1-da11f3934bd7 | unknown | 2020-05-06 16:36:22 |
game_actions['project_type'].unique()
array(['unknown', 'satellite_orbital_assembly'], dtype=object)
game_actions['building_type'].unique()
array(['assembly_shop', 'spaceport', 'unknown', 'research_center'],
dtype=object)
user_source.head()
| | user_id | source |
|---|---|---|
| 0 | 0001f83c-c6ac-4621-b7f0-8a28b283ac30 | facebook_ads |
| 1 | 00151b4f-ba38-44a8-a650-d7cf130a0105 | yandex_direct |
| 2 | 001aaea6-3d14-43f1-8ca8-7f48820f17aa | youtube_channel_reklama |
| 3 | 001d39dc-366c-4021-9604-6a3b9ff01e25 | instagram_new_adverts |
| 4 | 002f508f-67b6-479f-814b-b05f00d4e995 | facebook_ads |
ad_costs.sample(5)
| | source | day | cost |
|---|---|---|---|
| 19 | yandex_direct | 2020-05-08 | 62.961630 |
| 25 | youtube_channel_reklama | 2020-05-07 | 55.740645 |
| 1 | facebook_ads | 2020-05-04 | 548.354480 |
| 24 | youtube_channel_reklama | 2020-05-06 | 88.506074 |
| 2 | facebook_ads | 2020-05-05 | 260.185754 |
ad_costs.describe()
| | cost |
|---|---|
| count | 28.000000 |
| mean | 271.556321 |
| std | 286.867650 |
| min | 23.314669 |
| 25% | 66.747365 |
| 50% | 160.056443 |
| 75% | 349.034473 |
| max | 969.139394 |
Nothing else in preprocessing calls for attention.
game_actions['event'].unique()
array(['building', 'finished_stage_1', 'project'], dtype=object)
# Keep only project-completion and first-level-completion events;
# .copy() makes an independent frame, so later column assignments do not raise SettingWithCopyWarning
project_finished = game_actions.query('event in ["project", "finished_stage_1"]').copy()
# Number of such events per user: 2 means project + finished_stage_1
project_finished['count'] = project_finished['user_id'].map(project_finished['user_id'].value_counts())
project_finished
| | event_datetime | event | building_type | user_id | project_type | time | count |
|---|---|---|---|---|---|---|---|
| 6659 | 2020-05-04 19:47:29 | finished_stage_1 | unknown | ced7b368-818f-48f6-9461-2346de0892c5 | unknown | 2020-05-04 19:47:29 | 1 |
| 13134 | 2020-05-05 13:22:09 | finished_stage_1 | unknown | 7ef7fc89-2779-46ea-b328-9e5035b83af5 | unknown | 2020-05-05 13:22:09 | 1 |
| 15274 | 2020-05-05 18:54:37 | finished_stage_1 | unknown | 70db22b3-c2f4-43bc-94ea-51c8d2904a29 | unknown | 2020-05-05 18:54:37 | 1 |
| 16284 | 2020-05-05 21:27:29 | finished_stage_1 | unknown | 903fc9ef-ba97-4b12-9d5c-ac8d602fbd8b | unknown | 2020-05-05 21:27:29 | 1 |
| 19650 | 2020-05-06 06:02:22 | finished_stage_1 | unknown | 58e077ba-feb1-4556-a5a0-d96bd04efa39 | unknown | 2020-05-06 06:02:22 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 135632 | 2020-06-04 15:50:38 | finished_stage_1 | unknown | 22cce310-fe10-41a2-941b-9c3d63327fea | unknown | 2020-06-04 15:50:38 | 1 |
| 135633 | 2020-06-04 17:56:14 | finished_stage_1 | unknown | d477dde8-7c22-4f23-9c4f-4ec31a1aa4c8 | unknown | 2020-06-04 17:56:14 | 2 |
| 135636 | 2020-06-05 02:25:12 | finished_stage_1 | unknown | 515c1952-99aa-4bca-a7ea-d0449eb5385a | unknown | 2020-06-05 02:25:12 | 1 |
| 135638 | 2020-06-05 12:12:27 | finished_stage_1 | unknown | 32572adb-900f-4b5d-a453-1eb1e6d88d8b | unknown | 2020-06-05 12:12:27 | 1 |
| 135639 | 2020-06-05 12:32:49 | finished_stage_1 | unknown | f21d179f-1c4b-437e-b9c6-ab1976907195 | unknown | 2020-06-05 12:32:49 | 1 |
7683 rows × 7 columns
project_finished['count'] = project_finished['count'].astype(int)
project_finished['count'].unique()
array([1, 2])
project_finished['count'] = ['warrior' if x == 1 else 'builder' for x in project_finished['count']]
project_finished['count'].unique()
array(['warrior', 'builder'], dtype=object)
project_finished.head()
| | event_datetime | event | building_type | user_id | project_type | time | count |
|---|---|---|---|---|---|---|---|
| 6659 | 2020-05-04 19:47:29 | finished_stage_1 | unknown | ced7b368-818f-48f6-9461-2346de0892c5 | unknown | 2020-05-04 19:47:29 | warrior |
| 13134 | 2020-05-05 13:22:09 | finished_stage_1 | unknown | 7ef7fc89-2779-46ea-b328-9e5035b83af5 | unknown | 2020-05-05 13:22:09 | warrior |
| 15274 | 2020-05-05 18:54:37 | finished_stage_1 | unknown | 70db22b3-c2f4-43bc-94ea-51c8d2904a29 | unknown | 2020-05-05 18:54:37 | warrior |
| 16284 | 2020-05-05 21:27:29 | finished_stage_1 | unknown | 903fc9ef-ba97-4b12-9d5c-ac8d602fbd8b | unknown | 2020-05-05 21:27:29 | warrior |
| 19650 | 2020-05-06 06:02:22 | finished_stage_1 | unknown | 58e077ba-feb1-4556-a5a0-d96bd04efa39 | unknown | 2020-05-06 06:02:22 | warrior |
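The same labels can also be derived per user rather than per row. In the row-based version above, a user with a project event but no finished_stage_1 event (if any exist) would get count 1 and be labeled 'warrior'. A sketch, keeping the notebook's assumption that a user with both events is a 'builder' and a user with finished_stage_1 alone is a 'warrior'; the toy data is hypothetical.

```python
import pandas as pd

# Hypothetical toy event log
events = pd.DataFrame({
    'user_id': ['a', 'a', 'b', 'b', 'c', 'd'],
    'event': ['building', 'finished_stage_1', 'project', 'finished_stage_1', 'building', 'project'],
})

# Users who completed the first level
finished = set(events.loc[events['event'] == 'finished_stage_1', 'user_id'])
# Users who completed the orbital assembly project
builders = set(events.loc[events['event'] == 'project', 'user_id'])

# One row per finished user with a strategy label; 'd' never finished, so it is excluded
labels = pd.DataFrame({'user_id': sorted(finished)})
labels['strategy'] = labels['user_id'].map(lambda u: 'builder' if u in builders else 'warrior')
print(labels)
```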
df1 = pd.merge(game_actions, project_finished, how = 'left')
df1 = df1.dropna()
df1
| | event_datetime | event | building_type | user_id | project_type | time | count |
|---|---|---|---|---|---|---|---|
| 6659 | 2020-05-04 19:47:29 | finished_stage_1 | unknown | ced7b368-818f-48f6-9461-2346de0892c5 | unknown | 2020-05-04 19:47:29 | warrior |
| 13134 | 2020-05-05 13:22:09 | finished_stage_1 | unknown | 7ef7fc89-2779-46ea-b328-9e5035b83af5 | unknown | 2020-05-05 13:22:09 | warrior |
| 15274 | 2020-05-05 18:54:37 | finished_stage_1 | unknown | 70db22b3-c2f4-43bc-94ea-51c8d2904a29 | unknown | 2020-05-05 18:54:37 | warrior |
| 16284 | 2020-05-05 21:27:29 | finished_stage_1 | unknown | 903fc9ef-ba97-4b12-9d5c-ac8d602fbd8b | unknown | 2020-05-05 21:27:29 | warrior |
| 19650 | 2020-05-06 06:02:22 | finished_stage_1 | unknown | 58e077ba-feb1-4556-a5a0-d96bd04efa39 | unknown | 2020-05-06 06:02:22 | warrior |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 135631 | 2020-06-04 15:50:38 | finished_stage_1 | unknown | 22cce310-fe10-41a2-941b-9c3d63327fea | unknown | 2020-06-04 15:50:38 | warrior |
| 135632 | 2020-06-04 17:56:14 | finished_stage_1 | unknown | d477dde8-7c22-4f23-9c4f-4ec31a1aa4c8 | unknown | 2020-06-04 17:56:14 | builder |
| 135635 | 2020-06-05 02:25:12 | finished_stage_1 | unknown | 515c1952-99aa-4bca-a7ea-d0449eb5385a | unknown | 2020-06-05 02:25:12 | warrior |
| 135637 | 2020-06-05 12:12:27 | finished_stage_1 | unknown | 32572adb-900f-4b5d-a453-1eb1e6d88d8b | unknown | 2020-06-05 12:12:27 | warrior |
| 135638 | 2020-06-05 12:32:49 | finished_stage_1 | unknown | f21d179f-1c4b-437e-b9c6-ab1976907195 | unknown | 2020-06-05 12:32:49 | warrior |
7683 rows × 7 columns
game_actions = game_actions.rename(columns={'event_datetime': 'day'})
df = pd.merge(game_actions, user_source, how = 'left')
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 135639 entries, 0 to 135638
Data columns (total 7 columns):
 #   Column          Non-Null Count   Dtype
---  ------          --------------   -----
 0   day             135639 non-null  object
 1   event           135639 non-null  object
 2   building_type   135639 non-null  object
 3   user_id         135639 non-null  object
 4   project_type    135639 non-null  object
 5   time            135639 non-null  datetime64[ns]
 6   source          135639 non-null  object
dtypes: datetime64[ns](1), object(6)
memory usage: 8.3+ MB
df.head()
| | day | event | building_type | user_id | project_type | time | source |
|---|---|---|---|---|---|---|---|
| 0 | 2020-05-04 00:00:01 | building | assembly_shop | 55e92310-cb8e-4754-b622-597e124b03de | unknown | 2020-05-04 00:00:01 | youtube_channel_reklama |
| 1 | 2020-05-04 00:00:03 | building | assembly_shop | c07b1c10-f477-44dc-81dc-ec82254b1347 | unknown | 2020-05-04 00:00:03 | facebook_ads |
| 2 | 2020-05-04 00:00:16 | building | assembly_shop | 6edd42cc-e753-4ff6-a947-2107cd560710 | unknown | 2020-05-04 00:00:16 | instagram_new_adverts |
| 3 | 2020-05-04 00:00:16 | building | assembly_shop | 92c69003-d60a-444a-827f-8cc51bf6bf4c | unknown | 2020-05-04 00:00:16 | facebook_ads |
| 4 | 2020-05-04 00:00:35 | building | assembly_shop | cdc6bb92-0ccb-4490-9866-ef142f09139d | unknown | 2020-05-04 00:00:35 | yandex_direct |
# Parse timestamps and derive the ISO calendar week
# (Series.dt.week is deprecated; dt.isocalendar().week replaces it)
df['day'] = pd.to_datetime(df['day'])
df['week'] = df['day'].dt.isocalendar().week
# Unique users per calendar day: 'day' holds full timestamps, so group by the date part
dau = df.groupby(df['day'].dt.date).agg({'user_id': 'nunique'})
ax_dau = dau.plot()
ax_dau.set_title('Daily active users')
ax_dau.set_xlabel('Date')
ax_dau.set_ylabel('Users')
The number of unique users per day keeps falling: fewer and fewer players stay active and progress through the level.
wau = df.groupby('week').agg({'user_id': 'nunique'})
wau
| week | user_id |
|---|---|
| 19 | 13576 |
| 20 | 12121 |
| 21 | 4353 |
| 22 | 521 |
| 23 | 29 |
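Since all 13576 users appear in week 19 (the week the ad campaign ran), the WAU table can be read as a weekly retention curve for a single cohort. A sketch that divides each week by the first one, using the numbers from the table above. Note that "active" here means having any event, so the decline may mix churn with players who simply stopped after finishing the level.

```python
import pandas as pd

# Weekly active users copied from the WAU table above
wau = pd.Series({19: 13576, 20: 12121, 21: 4353, 22: 521, 23: 29}, name='user_id')

# Share of the week-19 cohort still active in each later week
retention = (wau / wau.iloc[0]).round(3)
print(retention)
```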
import plotly.graph_objects as go
# Share of each event type among all logged events
pie = df.groupby('event')['user_id'].count().reset_index()
labels = pie.event
values = pie.user_id
fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=0.3)])
fig.update_traces(hoverinfo='label+percent', textinfo='label+value+percent')
fig.update_layout(title_text='Events by type')
fig.show()
# 'unknown' covers every event that is not a project completion
pie = df.groupby('project_type')['user_id'].count().reset_index()
labels = pie.project_type
values = pie.user_id
fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=0.3)])
fig.update_traces(hoverinfo='label+percent', textinfo='label+value+percent')
fig.update_layout(title_text='Completed projects')
fig.show()
import plotly.express as px
ad_costs.head()
| | source | day | cost |
|---|---|---|---|
| 0 | facebook_ads | 2020-05-03 | 935.882786 |
| 1 | facebook_ads | 2020-05-04 | 548.354480 |
| 2 | facebook_ads | 2020-05-05 | 260.185754 |
| 3 | facebook_ads | 2020-05-06 | 177.982200 |
| 4 | facebook_ads | 2020-05-07 | 111.766796 |
# Total ad spend per day; 'day' is the date of the ad click
fig = px.line(ad_costs.groupby('day')['cost'].sum().reset_index(), x='day', y='cost')
fig.update_layout(title_text='Total ad spend (cost) by day of the ad click')
fig.show()
ad_costs
| | source | day | cost |
|---|---|---|---|
| 0 | facebook_ads | 2020-05-03 | 935.882786 |
| 1 | facebook_ads | 2020-05-04 | 548.354480 |
| 2 | facebook_ads | 2020-05-05 | 260.185754 |
| 3 | facebook_ads | 2020-05-06 | 177.982200 |
| 4 | facebook_ads | 2020-05-07 | 111.766796 |
| 5 | facebook_ads | 2020-05-08 | 68.009276 |
| 6 | facebook_ads | 2020-05-09 | 38.723350 |
| 7 | instagram_new_adverts | 2020-05-03 | 943.204717 |
| 8 | instagram_new_adverts | 2020-05-04 | 502.925451 |
| 9 | instagram_new_adverts | 2020-05-05 | 313.970984 |
| 10 | instagram_new_adverts | 2020-05-06 | 173.071145 |
| 11 | instagram_new_adverts | 2020-05-07 | 109.915254 |
| 12 | instagram_new_adverts | 2020-05-08 | 71.578739 |
| 13 | instagram_new_adverts | 2020-05-09 | 46.775400 |
| 14 | yandex_direct | 2020-05-03 | 969.139394 |
| 15 | yandex_direct | 2020-05-04 | 554.651494 |
| 16 | yandex_direct | 2020-05-05 | 308.232990 |
| 17 | yandex_direct | 2020-05-06 | 180.917099 |
| 18 | yandex_direct | 2020-05-07 | 114.429338 |
| 19 | yandex_direct | 2020-05-08 | 62.961630 |
| 20 | yandex_direct | 2020-05-09 | 42.779505 |
| 21 | youtube_channel_reklama | 2020-05-03 | 454.224943 |
| 22 | youtube_channel_reklama | 2020-05-04 | 259.073224 |
| 23 | youtube_channel_reklama | 2020-05-05 | 147.041741 |
| 24 | youtube_channel_reklama | 2020-05-06 | 88.506074 |
| 25 | youtube_channel_reklama | 2020-05-07 | 55.740645 |
| 26 | youtube_channel_reklama | 2020-05-08 | 40.217907 |
| 27 | youtube_channel_reklama | 2020-05-09 | 23.314669 |
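Spend falls steeply from day to day for every channel. Summing it per source gives the acquisition budget that ad revenue has to recover. A sketch using the values in the table above; the break-even figure additionally assumes, hypothetically, that exactly one ad is shown per building event, and takes the 127957 non-null building_type values reported by game_actions.info() as the impression count.

```python
import pandas as pd

# Daily spend from the ad_costs table above (2020-05-03 .. 2020-05-09)
daily_cost = {
    'facebook_ads': [935.882786, 548.354480, 260.185754, 177.982200, 111.766796, 68.009276, 38.723350],
    'instagram_new_adverts': [943.204717, 502.925451, 313.970984, 173.071145, 109.915254, 71.578739, 46.775400],
    'yandex_direct': [969.139394, 554.651494, 308.232990, 180.917099, 114.429338, 62.961630, 42.779505],
    'youtube_channel_reklama': [454.224943, 259.073224, 147.041741, 88.506074, 55.740645, 40.217907, 23.314669],
}
costs = pd.DataFrame(daily_cost, index=pd.date_range('2020-05-03', periods=7, freq='D'))

# Total spend per source and overall
total_by_source = costs.sum().round(2)
total_cost = costs.to_numpy().sum()
print(total_by_source)
print(round(total_cost, 2))

# Hypothetical scenario: one ad impression per building event
impressions = 127957
break_even_per_impression = total_cost / impressions
print(f'break-even revenue per impression: {break_even_per_impression:.4f}')
```

Any revenue per impression above the break-even value would make the campaign pay back under this scenario; other scenarios (fewer impressions per user, only some building events) just change `impressions`.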
pie = user_source.groupby('source')['user_id'].count().reset_index()
labels = pie.source
values = pie.user_id
fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=0.3)])
fig.update_traces(hoverinfo='label+percent', textinfo='label+value+percent')
fig.update_layout(title_text='Users by acquisition source')
fig.show()
The chart shows how users are split across acquisition sources.
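To compare channels, spend can be divided by the number of users each source brought in (the counts behind the pie chart). A sketch of that join; the function name `cost_per_user` and the toy frames are hypothetical, and on the real data you would pass ad_costs and user_source.

```python
import pandas as pd


def cost_per_user(ad_costs: pd.DataFrame, user_source: pd.DataFrame) -> pd.DataFrame:
    """Total spend, acquired users and spend per user for each source."""
    spend = ad_costs.groupby('source')['cost'].sum()
    users = user_source.groupby('source')['user_id'].nunique()
    # Align both series on the source name
    result = pd.concat([spend, users], axis=1)
    result.columns = ['cost', 'users']
    result['cost_per_user'] = result['cost'] / result['users']
    return result


# Hypothetical toy inputs with the same columns as ad_costs and user_source
toy_costs = pd.DataFrame({'source': ['a', 'a', 'b'], 'cost': [10.0, 5.0, 8.0]})
toy_users = pd.DataFrame({'user_id': ['u1', 'u2', 'u3', 'u4'], 'source': ['a', 'a', 'a', 'b']})
print(cost_per_user(toy_costs, toy_users))
```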
Statistical hypotheses
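Hypothesis 1. H0: the time it takes to complete the first level is the same for users who completed the orbital assembly project ('builder') and users who finished without it ('warrior'). H1: the time differs. The Mann-Whitney U test is used at alpha = 0.05.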
min_event = game_actions.groupby(['user_id','event'])['time'].min().reset_index()
time_event = game_actions.query("event == 'finished_stage_1'")[['user_id','time']]
event_time = min_event.merge(time_event,on = 'user_id', how = 'inner')
event_time.columns = ['user_id','event','start','finish']
event_time['diff_time_event'] = event_time['finish'] - event_time['start']
event_time = event_time.merge(df1, on ='user_id', how = 'inner')
event_time.head(1)
| | user_id | event_x | start | finish | diff_time_event | event_datetime | event_y | building_type | project_type | time | count |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 001d39dc-366c-4021-9604-6a3b9ff01e25 | building | 2020-05-05 21:02:05 | 2020-05-12 07:40:47 | 6 days 10:38:42 | 2020-05-12 07:40:47 | finished_stage_1 | unknown | unknown | 2020-05-12 07:40:47 | warrior |
alpha = 0.05
# Durations in seconds; the test is rank-based, so the conversion does not change the result
warrior_time = event_time.loc[event_time['count'] == 'warrior', 'diff_time_event'].dt.total_seconds()
builder_time = event_time.loc[event_time['count'] == 'builder', 'diff_time_event'].dt.total_seconds()
results = st.mannwhitneyu(warrior_time, builder_time, alternative='two-sided')
pvalue = results.pvalue
print('p-value:', pvalue)
if pvalue < alpha:
    print('Reject H0: the difference is statistically significant')
else:
    print('Failed to reject H0: no difference can be concluded')
p-value: 2.9185930689384226e-15
Reject H0: the difference is statistically significant
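The event_time frame above has several rows per user (one per event type from min_event, and builders are duplicated again by the merge with df1, which holds two rows for them), and the finished_stage_1 rows have a zero duration. The test therefore compares rows rather than users. A per-user variant is sketched below: one duration per user, from the first event to finished_stage_1. The toy data and the choice of the first event as the start point are assumptions; this is not the notebook's original computation.

```python
import pandas as pd
from scipy import stats as st

# Hypothetical toy log: two 'warrior' users and two 'builder' users
log = pd.DataFrame({
    'user_id': ['w1', 'w1', 'w2', 'w2', 'b1', 'b1', 'b1', 'b2', 'b2', 'b2'],
    'event': ['building', 'finished_stage_1', 'building', 'finished_stage_1',
              'building', 'project', 'finished_stage_1',
              'building', 'project', 'finished_stage_1'],
    'time': pd.to_datetime([
        '2020-05-04 10:00', '2020-05-09 10:00', '2020-05-05 08:00', '2020-05-11 20:00',
        '2020-05-04 09:00', '2020-05-15 09:00', '2020-05-15 09:05',
        '2020-05-06 12:00', '2020-05-19 12:00', '2020-05-19 12:10',
    ]),
})

# One row per user: first event time and first-level completion time
start = log.groupby('user_id')['time'].min()
finish = log[log['event'] == 'finished_stage_1'].groupby('user_id')['time'].min()
builders = set(log.loc[log['event'] == 'project', 'user_id'])

per_user = pd.DataFrame({'start': start, 'finish': finish}).dropna()
per_user['hours'] = (per_user['finish'] - per_user['start']).dt.total_seconds() / 3600
per_user['strategy'] = ['builder' if u in builders else 'warrior' for u in per_user.index]

warrior_hours = per_user.loc[per_user['strategy'] == 'warrior', 'hours']
builder_hours = per_user.loc[per_user['strategy'] == 'builder', 'hours']
result = st.mannwhitneyu(warrior_hours, builder_hours, alternative='two-sided')
print(per_user)
print('p-value:', result.pvalue)
```

With real data each group would contain thousands of users; the toy sample is far too small for the p-value to mean anything.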
df1[df1['count']=='warrior']['event_datetime'].head()
6659     2020-05-04 19:47:29
13134    2020-05-05 13:22:09
15274    2020-05-05 18:54:37
16284    2020-05-05 21:27:29
19650    2020-05-06 06:02:22
Name: event_datetime, dtype: object
df1[df1['count']=='warrior']['time'].head()
6659    2020-05-04 19:47:29
13134   2020-05-05 13:22:09
15274   2020-05-05 18:54:37
16284   2020-05-05 21:27:29
19650   2020-05-06 06:02:22
Name: time, dtype: datetime64[ns]
# Number of unique values per column for yandex_direct events during the campaign week
df2 = df[df['source'] == 'yandex_direct'].set_index('day').loc['2020-05-03':'2020-05-09'].nunique()
df2
event               3
building_type       4
user_id          4728
project_type        2
time            21872
source              1
week                1
dtype: int64
# The same for youtube_channel_reklama
df3 = df[df['source'] == 'youtube_channel_reklama'].set_index('day').loc['2020-05-03':'2020-05-09'].nunique()
df3
event               3
building_type       4
user_id          2630
project_type        2
time            12196
source              1
week                1
dtype: int64
count_warrior_youtube = df.query("event == 'finished_stage_1' and source =='youtube_channel_reklama'").count()
count_warrior_youtube.unique()
array([1159])
count_warrior_yandex = df.query("event == 'finished_stage_1' and source =='yandex_direct'").count()
count_warrior_yandex.unique()
array([2042])
# Keep costs as floats: casting to int would truncate every daily value
yandex_direct = ad_costs.loc[ad_costs['source'] == 'yandex_direct', 'cost'].sum()
youtube_channel_reklama = ad_costs.loc[ad_costs['source'] == 'youtube_channel_reklama', 'cost'].sum()
print(f'{yandex_direct:.2f}')
2233.11
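Hypothesis 2. H0: the share of users who complete the first level is the same for users from youtube_channel_reklama and from yandex_direct. H1: the shares differ. A two-proportion z-test is used at alpha = 0.05.
alpha = 0.05
# Completions of the first level (successes) and active users in the campaign week (trials):
# index 0 = youtube_channel_reklama, index 1 = yandex_direct
purchases = np.array([1159, 2042])
leads = np.array([2630, 4728])
p1 = purchases[0] / leads[0]
p2 = purchases[1] / leads[1]
# Pooled proportion under H0
combined = (purchases[0] + purchases[1]) / (leads[0] + leads[1])
difference = p1 - p2
z_value = difference / math.sqrt(combined * (1 - combined) * (1 / leads[0] + 1 / leads[1]))
distr = st.norm(0, 1)
# Two-sided p-value
p_value = (1 - distr.cdf(abs(z_value))) * 2
print('p-value:', p_value)
if p_value < alpha:
    print('Reject the null hypothesis')
else:
    print('Failed to reject the null hypothesis')
p-value: 0.46611328936689755
Failed to reject the null hypothesis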
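The z-test above can be wrapped in a small reusable function so that other source pairs can be compared the same way. `norm.sf(|z|)` gives the same upper-tail probability as `1 - cdf(|z|)` but stays accurate for large |z|. A sketch (the function name `two_proportion_ztest` is hypothetical), checked against the counts used above:

```python
import math

from scipy import stats as st


def two_proportion_ztest(successes_a: int, trials_a: int,
                         successes_b: int, trials_b: int) -> tuple[float, float]:
    """Two-sided z-test for the difference of two proportions with pooled variance."""
    p_a = successes_a / trials_a
    p_b = successes_b / trials_b
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    z = (p_a - p_b) / se
    p_value = 2 * st.norm.sf(abs(z))
    return z, p_value


# youtube_channel_reklama vs yandex_direct, counts from the cells above
z, p = two_proportion_ztest(1159, 2630, 2042, 4728)
print(f'z = {z:.3f}, p-value = {p:.4f}')
```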